home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Experimental BBS Explossion 3
/
Experimental BBS Explossion III.iso
/
games
/
nhak_src.zip
/
MAINTAIN.OVL
< prev
next >
Wrap
Text File
|
1993-03-16
|
18KB
|
337 lines
Maintaining PC NetHack
========================
Last revision: 1990may27
The installation of the system of overlay management that currently
brings full-featured NetHack to the IBM PC and compatibles has
introduced a number of arcanities into the source code of the
programme, and unfortunately running afoul of these intricacies can
result (as we ourselves have discovered) in the most bizarre and
strangely inexplicable dysfunctional manifestations, aka sick bugs.
This document is required reading for anyone making substantive
changes to NetHack for the PC or embarking upon a revision of its
overlay structure.
1. The overlay manager
----------------------
NetHack is by now a fairly large programme (in excess of 800
kilobytes), and in order to compile it for the PC (which typically
has little more than 500k of available memory) it was necessary to
rely on the technique of _overlaying_, whereby not all the
programme is resident in memory at the same time, segments of the
programme being loaded and discarded as they are needed. Unlike
traditional candidates for the overlaying strategy, however, NetHack
does not exhibit strongly phased behaviour; although much of the code
is not being used at any one moment, there is comparatively little
code that can be confidently said not to be related to or potentially
necessary for the immediate progress of the game.
Furthermore we wished to develop an overlaying strategy that
did _not_ involve intimate knowledge of the operation of the
programme (since NetHack is an international team effort, and few
people have a good feeling for the totality of the code structure),
and which would not require substantive changes to the source code,
impacting on its maintainability and portability.
It turned out to be impossible to satisfy these goals with
tools that are widely available at the time of writing, and so we
undertook to write our own overlay manager (compatible with
Microsoft's, but more in concert with NetHack's particular needs).
The result is called ovlmgr.asm and is documented in the file
ovlmgr.doc. You would probably be well advised to read at least the
less technical parts of that file now.
2. The trampoli mechanism
-------------------------
One of the difficulties with using overlays for C (particularly
Microsoft C) is that while common C programming practise places heavy
reliance on function pointers, Microsoft's overlay linker is unable
to resolve calls through pointers to functions that are in remote
overlays. Nor, unfortunately, does it choose to report such failures;
rather, it generates calls into (what often turns out to be in the
case of our nonstandard overlay manager) the deepest of space. This
can result in truly strange behaviour on the part of your programme -
including bugs that come and go in as close to a random pattern as
you are ever likely to see.
Other than the creative use of pattern-matching utilities
such as grep to locate the offending calls, there is unfortunately no
advice we can offer in tracking down these bugs. Once they have been
isolated, however, they can be remedied straightforwardly.
In order for the linker not to screw up on a pointered function call
it is (to simplify an actually rather complicated situation)
necessary that the function called be located in the ROOT "overlay",
and thus not be subject to swapping. Rather than linking the full
text of every pointered function into the root, however, it suffices
to place a "trampoline" function there which performs a direct call
to the "real" function that does the work, in whatever overlay it
might naturally reside in. Due to a not-quite-accident of the
behaviour of the C preprocessor (it was originally intended to make
possible functions whose address can be taken but which expand inline
as macros where possible, a not unrelated function), it turns out to
be possible to arrange for this without major change to the C source
code - and without serious impact on the performance of "regular"
calls to the same functions.
The C preprocessor's expansion of a macro with parameters is triggered
by an encounter with the macro name immediately followed by an open
parenthesis. If the name is found, but it is not followed by a
parenthesis, the macro is not matched and no expansion takes place.
At the same time it may be noted that (unless someone has been oddly
strange and enclosed a function name in quite unneeded parentheses!),
a function name is typically followed by an open parenthesis if, and
only if, it is being declared, defined or invoked; if its address is
being taken it will necessarily be followed by some other token.
Furthermore (except in the unfortunate case of the ill-conceived
new-style ANSI declaration of a function that takes no parameters) it
will be observed that the number of parameters to a call of the
function (assuming that this number is fixed; if not, I grant, we
have a problem) is the same in all these contexts. This implies that
if all the modules of a programme are uniformly processed in the
context of a macro definition such as
#define zook(a,b) plenk(a,b)
and assuming that all functions named zook() take exactly two
arguments, then the resulting programme will be completely identical
to the original (without this definition) except that the link
map will report the existence of the function plenk() in place of
zook() -- UNLESS there was a place in the programme where the address
of zook was taken. In that case, the linker would report an
unresolved external reference for that symbol.
That unresolved reference is, of course, precisely what we
need; if in another source file (one that did not see the macro
definition) we placed the function declaration
some_t zook(this_t a, that_t b)
{ extern some_t plenk(this_t, that_t);
return plenk(a, b);
}
this would both satisfy the unresolved reference and restore the
original semantics of the programme (even including pointer
comparison!) -- while providing us with precisely the kind of
"trampoline" module that we need to circumvent the problem with the
linker.
This is the basis of the approach we have taken in PC
NetHack; rather than using the somewhat idiosyncratic identifier
"plenk", however, we have systematically employed (in the files
trampoli.h and trampoli.c) identifiers generated by appending
underscores to the ends of the names of the functions we have needed
to so indirect(1).
There are a few small complications. The first is ensuring that both
the versions of the trampoli'd function (foo() and foo_()) are
similarly typed by the appropriate extern declarations (which
themselves must be written); this can be accomplished by placing all
of these declarations in a header file that is processed _twice_,
once before and once after the inclusion of the file containing the
trampoli macro definitions, thereby ensuring that both variants of
the name have been seen in connection with the appropriate types. The
second is that some care must be exercised not to employ other macros
that interfere with the normal recognition of function syntax: it is
the presence of the open parenthesis after the name of the function
that triggers name substitution, and not the fact that the function
is called; and so (particularly in the case of declarations) it is
necessary that if a macro is used to supply the _arguments_ of a
trampoli'd function, it must also supply the name (this necessity in
fact triggered a change in the style of the macros that provide
dialect-independent function declaration in NetHack; the new style
would have you write FDECL(functionName, (argTypes...)).
Finally, there is the case of functions declared to take no
arguments whatsoever; in Standard C this is notated:
some_t aFunction(void);
for no theoretically well-motivated reason I can discern. Such a
declaration will _not_ match a macro definition such as
#define aFunction() aFunction_()
-- in fact the compiler will detect an error when processing that
declaration in the scope of this macro. The only solution is to
eschew the use of this strange syntax and unfrabjously forgo the
concomitant security of well- and thoroughly- checked typage. To
which end we have provided an ecchy macro, NDECL(functionName), which
uses the new syntax _unless_ the compiler is not Standard or OVERLAY
is enabled.
There is one further consideration: that this technique only applies,
of course, to functions that are published to the linker. For this
reason, wherever such trampoli'd functions were originally declared
static, that declaration has been changed to "STATIC_PTR", a macro
that expands to "static" unless the OVERLAY flag has been selected in
the configuration file, enabling the trampoli mechanism. Thus such
functions lose their privacy in this one case.
3. OVLx
-------
The strategies described above work fine, but they only stretch so
far. In particular, they do not admit of an overlay structure in
which functions are linked into different overlays even though they
originate in the same source file.
Classically, this is not considered a real limitation,
because one has the freedom to regroup the functions into different
source files as needed; however, in the case of NetHack this was not
a realistic option, since what structure this unwieldy programme has
is precisely in the current grouping of functions together.
Nonetheless, the ability to perform some functional grouping is an
absolute requirement for acceptable performance, since many NetHack
source modules (were.c, for example) contain one or two tiny
functions that are called with great frequency (several millions of
times per game is not unheard of) and whose return value determines
whether the remaining large, slow functions of the file will be
required at all in the near future. Obviously these small checking
functions should be linked into the same overlays with their callers,
while the remainder of the source module should not.
In order to make this possible we ran a dynamic profile on the game
to determine exactly which functions in which modules required such
distinguished treatment, and we have flagged each function for
conditional compilation (with #if ... #endif) in groups according
approximately to their frequency of invocation and functionality.
These groups have been arbitrarily named in each source file (in
decreasing order of frequency), OVL0, OVL1, OVL2, OVL3 and OVLB (B
for "base functions", those that deserve no special treatment at
all). It is thus possible to compile only a small number of the
functions in a file by defining but one or two of these symbols on
the compiler's command line (with the switch /DOVL2, for example);
the compiler will ignore the remainder as if they did not exist.
(There is an "escape clause" in hack.h that ensures that if none of
these flags is defined on the command line, then all of them will be
during compilation; this makes the non-use of this mechanism
straightforward!)
By repeated invocation of the compiler on the _same_ source
file it is possible to accumulate disjoint object modules that
between them contain the images of all the functions in the original
source, but partitioned as is most convenient. Care must, of course,
be taken over conflicts of name in both the object file put out (all
slices will by default be called SRCFILE.OBJ, and this default must
be overridden with distinct file names for each output slice) and in
the names of the text segments the compiler is to generate; you can
see this at work in Makefile.ovl. (You may wonder, as we did at
first, why the text segment name would have to be made distinct in
each object file slice (the default segment name is a function of the
source file name and the compilation model only). The reason for this
is, quite daftly to my mind, that the linker considers the identity
of segment names and combine classes better reason to combine
segments than the programmer's explicit instructions in the requested
overlay pattern are reason to keep them apart. Programmer, ask not
why...).
Once again, that works fine except for the small matter of
declarations (where have we heard this before?). For objects that
once were static must now be made visible to the linker that they may
be resolved across the reaches of inter-overlay space. To this end we
have provided three macros, all of which expand simply to "static" if
no OVLx flags are defined on the compilation command line. They are:
STATIC_DCL which introduces a declaration (as distinct from a
definition) of an object that would be static were it
not for the requirements of the OVLx mechanism. Its
expansion is "static", normally, but it becomes
"extern" in the event that this source file has been
split into slices with the OVLx mechanism.
STATIC_OVL is used when _defining_ a function (giving its text,
that is) that is logically static but may be called
across slices; it expands to "static" unless OVLx is
active; in the latter case it expands to null,
leaving the function with "previous linkage" as the
standard says. Note that this behaviour is quite
similar to, but very different from, that of
STATIC_PTR (described above), which has the same two
expansions but which is triggered not by OVLx but by
the OVERLAY flag which enables the trampoli mechanism.
STATIC_OVL also differs from the STATIC_DCL
and STATIC_VAR in that it is employed _within_ OVLx
slices, while the others are used to generate
declarations and are deployed in areas common to all
slices.
STATIC_VAR is used to introduce uninitialised would-be-static
variables. Its expansion is complex, since it must
read as "static" in the usual case, but as "extern"
if OVLx is in use -- in all overlays but one, where
it must expand to the null sequence -- giving it
"previous linkage" and "tentative definition" (to
ensure that the variable gets defined at all).
This one took a while to get right, and
believe me, using the macro is a lot easier than
trying to keep the #ifdefs straight yourself!
An initialised variable that is file-level static unless OVLx is in
use must now be written with a STATIC_DCL declaration, and a
definition (and static initialiser) enclosed within the bracketing
tag of one of the OVLx slices (any will do; we use OVLB).
Type definitions, macro definitions and extern declarations
should, of course remain outside any OVLx slice.
Finally, of course, objects whose visibility need not be extended may
safely continue to be declared static. And in this case, at least,
the compiler will provide diagnostics that inform you when an object
has slipped through the cracks and requires the application of Magic
Macro Salve.
It is perhaps less than obvious that when a function is _both_ called
across an OVLx split and referenced through a pointer, it should be
treated as a pointered function (that is, it should get trampoli
entries and should be defined STATIC_PTR). The reason for this is that
the STATIC_xxx macros associated with OVLx _only_ change the
declaration patterns of the objects, while trampoli results in the
generation of necessary code.
It is correct to do this, because the declarations produced by
STATIC_PTR are triggered by OVERLAY's being defined, and the selection
of OVERLAY is an absolute precondition for the activation of OVLx.
4. Hacking
----------
Before undertaking any serious modifications to the overlay structure
or support mechanisms, you should know that a _lot_ of work has gone
into the current scheme. If performance seems poor, remember: the
overlay manager itself can be invoked up to ten thousand times in a
second, and although the space available for loading overlays (once
the data and stack spaces have been accounted for) is less than half
the total size of the overlays that are swapped through it, a disk
access occurs well under 0.1% of the time(2). Furthermore, this
performance (such as it is) has been achieved without substantive
change or restructuring of the NetHack source code, which must remain
portable to many platforms other than the PC.
If these observations do not daunt you, you are a true Bit Warrior
indeed (or aspiration anyway), and we await your comments with bait.
------------------------------------------------------------------------
NOTES:
------
(1) In fact, we have applied this technique throughout NetHack, even
in cases where it is not strictly necessary (since the pointered
calls are not across overlay splits, for example - though note
that there are more splits than might be initially apparent, due
to the effects of the OVLx hackage as described in section 3).
There is, however, one exception; and beware: it is an exception
with fangs. The file termcap.c contains a few pointered functions
that we decided _not_ to trampoli for performance reasons (screen
output is one of the problem areas on the PC port at the moment,
in terms of performance). It is therefore vital to the health of
PC NetHack as it currently stands that the OVLx slice termcap.0 be
linked into the ROOT "overlay".
(2) These figures are for a 4.77 MHz PC-XT running in low memory with
an older version of both the overlay manager and the NetHack
overlay arrangement. On a more capable computer and with the
current software, the figures are probably more like a 100kHz peak
service rate and a hit rate (since we fixed the bug in the LRU
clock logic!) in excess of 99.99% -- hopefully not both at the
same time.
------------------------------------------------------------------------
Stephen P Spackman stephen@tira.uchicago.edu
------------------------------------------------------------------------
* Hack On! *